Shampoo: Preconditioned Stochastic Tensor Optimization

Authors

  • Vineet Gupta
  • Tomer Koren
  • Yoram Singer
Abstract

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo’s runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
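The per-dimension preconditioning described in the abstract is easiest to see for a two-dimensional (matrix) parameter, where Shampoo keeps a left and a right preconditioner built from gradient statistics contracted over the opposite dimension. The NumPy sketch below illustrates that special case; the step size, epsilon, and function names are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the Shampoo update for a matrix-shaped parameter.
# Hyperparameters (eta, eps) and function names are illustrative choices.
import numpy as np

def matrix_power(mat, power):
    """Raise a symmetric PSD matrix to a (possibly negative) power
    via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.maximum(eigvals, 1e-12)  # guard against round-off
    return (eigvecs * eigvals**power) @ eigvecs.T

def shampoo_step(W, G, L, R, eta=0.1):
    """One update for a parameter W of shape (m, n) with gradient G.

    L (m x m) and R (n x n) accumulate second-moment statistics of the
    gradient along each dimension, contracting over the other dimension."""
    L = L + G @ G.T                       # left preconditioner statistics
    R = R + G.T @ G                       # right preconditioner statistics
    precond_G = matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    return W - eta * precond_G, L, R

# Usage: initialize the statistics with eps * identity, then iterate.
m, n = 4, 3
eps = 1e-4
W = np.random.randn(m, n)
L, R = eps * np.eye(m), eps * np.eye(n)
G = np.random.randn(m, n)  # stochastic gradient from one mini-batch
W, L, R = shampoo_step(W, G, L, R)
```

For a general order-k tensor parameter, the same pattern extends to one preconditioner per mode, each applied along its own dimension of the gradient.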

Similar references

Low-rank tensor completion: a Riemannian manifold preconditioning approach

We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in Tucker decomposition. The specific metric allows the use of the versatile framework of Riemannian opt...

Full text

Preconditioned Low-rank Riemannian Optimization for Linear Systems with Tensor Product Structure

The numerical solution of partial differential equations on high-dimensional domains gives rise to computationally challenging linear systems. When using standard discretization techniques, the size of the linear system grows exponentially with the number of dimensions, making the use of classic iterative solvers infeasible. During the last few years, low-rank tensor approaches have been develo...

Full text

Recurrent neural network training with preconditioned stochastic gradient descent

Recurrent neural networks (RNN), especially the ones requiring extremely long-term memories, are difficult to train. Hence, they provide an ideal testbed for benchmarking the performance of optimization algorithms. This paper reports test results of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on RNN training. We find that PSGD may outperform Hessian-free o...

Full text

Low Rank Solution of Unsteady Diffusion Equations with Stochastic Coefficients

We study the solution of linear systems resulting from the discretization of unsteady diffusion equations with stochastic coefficients. In particular, we focus on those linear systems that are obtained using the so-called stochastic Galerkin finite element method (SGFEM). These linear systems are usually very large with Kronecker product structure and, thus, solving them can be both time- and co...

Full text

Fast tensor product solvers for optimization problems with fractional differential equations as constraints

Fractional differential equations have recently received much attention within computational mathematics and applied science, and their numerical treatment is an important research area, as such equations pose substantial challenges to existing algorithms. An optimization problem with constraints given by fractional differential equations is considered, which in its discretized form leads to a h...

Full text

Journal:
  • CoRR

Volume: abs/1802.09568  Issue: -

Pages: -

Publication year: 2018